Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 13131252 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 2.3 GiB |
| Average record size in memory | 190.0 B |
Variable types
| NUM | 8 |
|---|---|
| CAT | 2 |
| BOOL | 1 |
Warnings
| Warning | Type |
|---|---|
| clv is highly correlated with transaction_value and 1 other field | High correlation |
| transaction_value is highly correlated with clv and 1 other field | High correlation |
| earned_reward_points is highly correlated with transaction_value and 2 other fields | High correlation |
| total_reward_points is highly correlated with earned_reward_points | High correlation |
| referred_friends has 5010003 (38.2%) zeros | Zeros |
Reproduction
| Analysis started | 2020-11-18 17:25:50.572509 |
|---|---|
| Analysis finished | 2020-11-18 17:42:30.181968 |
| Duration | 16 minutes and 39.61 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
customer_id
Real number (ℝ≥0)
| Distinct | 924342 |
|---|---|
| Distinct (%) | 7.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 746300.5984 |
|---|---|
| Minimum | 2 |
| Maximum | 1423208 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 100.2 MiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 135089 |
| Q1 | 514503 |
| median | 770364 |
| Q3 | 999609 |
| 95-th percentile | 1263389 |
| Maximum | 1423208 |
| Range | 1423206 |
| Interquartile range (IQR) | 485106 |
Descriptive statistics
| Standard deviation | 335268.1639 |
|---|---|
| Coefficient of variation (CV) | 0.4492401113 |
| Kurtosis | -0.6823767032 |
| Mean | 746300.5984 |
| Median Absolute Deviation (MAD) | 240714 |
| Skewness | -0.2604560407 |
| Sum | 9.799861226e+12 |
| Variance | 1.124047417e+11 |
| Monotonicity | Not monotonic |
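Several of the figures in the statistics tables above are simple functions of one another (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean). The sketch below recomputes them for a small made-up sample and then re-checks the customer_id figures reported above. It assumes the sample (n - 1) standard deviation and inclusive quantile interpolation; the report does not state which conventions it uses, and the function name `describe` is purely illustrative.

```python
import statistics


def describe(values):
    """Recompute a few summary statistics of the kind shown in the report."""
    data = sorted(values)
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # assumption: sample standard deviation (n - 1)
    median = statistics.median(data)
    # Quartiles with linear interpolation between order statistics (assumption).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Median absolute deviation: median of |x - median|.
    mad = statistics.median(abs(x - median) for x in data)
    return {
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "cv": std / mean,
        "mad": mad,
    }


# Hypothetical toy sample, not taken from the profiled dataset.
toy = describe([2, 4, 4, 5, 7, 9, 11])

# Consistency checks on the customer_id figures reported above.
reported_iqr = 999609 - 514503          # Q3 - Q1
reported_range = 1423208 - 2            # maximum - minimum
reported_cv = 335268.1639 / 746300.5984  # standard deviation / mean
```

With the reported quartiles, `reported_iqr` reproduces the IQR of 485106 and `reported_range` the range of 1423206 listed in the table.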
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 478997 | 45 | < 0.1% | |
| 521105 | 45 | < 0.1% | |
| 610906 | 45 | < 0.1% | |
| 635462 | 45 | < 0.1% | |
| 316405 | 45 | < 0.1% | |
| 275433 | 45 | < 0.1% | |
| 349125 | 45 | < 0.1% | |
| 496541 | 45 | < 0.1% | |
| 512917 | 45 | < 0.1% | |
| 471945 | 45 | < 0.1% | |
| 700972 | 45 | < 0.1% | |
| 488321 | 45 | < 0.1% | |
| 127825 | 45 | < 0.1% | |
| 135981 | 45 | < 0.1% | |
| 569977 | 45 | < 0.1% | |
| 627293 | 45 | < 0.1% | |
| 635481 | 45 | < 0.1% | |
| 651857 | 45 | < 0.1% | |
| 242438 | 45 | < 0.1% | |
| 209686 | 45 | < 0.1% | |
| 193326 | 45 | < 0.1% | |
| 37730 | 45 | < 0.1% | |
| 406452 | 45 | < 0.1% | |
| 480152 | 45 | < 0.1% | |
| 4976 | 45 | < 0.1% | |
| Other values (924317) | 13130127 | > 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2 | 45 | < 0.1% | |
| 4 | 43 | < 0.1% | |
| 7 | 2 | < 0.1% | |
| 14 | 16 | < 0.1% | |
| 20 | 45 | < 0.1% | |
| 30 | 45 | < 0.1% | |
| 40 | 45 | < 0.1% | |
| 43 | 45 | < 0.1% | |
| 47 | 9 | < 0.1% | |
| 89 | 45 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1423208 | 1 | < 0.1% | |
| 1423207 | 1 | < 0.1% | |
| 1423206 | 1 | < 0.1% | |
| 1423205 | 1 | < 0.1% | |
| 1423204 | 1 | < 0.1% | |
| 1423203 | 1 | < 0.1% | |
| 1423202 | 1 | < 0.1% | |
| 1423201 | 1 | < 0.1% | |
| 1423200 | 1 | < 0.1% | |
| 1423199 | 1 | < 0.1% |
month
Categorical
| Distinct | 45 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 100.2 MiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2020-10-01 | 347964 | 2.6% | |
| 2020-09-01 | 345254 | 2.6% | |
| 2020-08-01 | 342678 | 2.6% | |
| 2020-07-01 | 340267 | 2.6% | |
| 2020-06-01 | 337663 | 2.6% | |
| 2020-05-01 | 335065 | 2.6% | |
| 2020-04-01 | 332608 | 2.5% | |
| 2020-03-01 | 330286 | 2.5% | |
| 2020-02-01 | 327755 | 2.5% | |
| 2020-01-01 | 325231 | 2.5% | |
| 2019-12-01 | 322717 | 2.5% | |
| 2019-11-01 | 320025 | 2.4% | |
| 2019-10-01 | 317369 | 2.4% | |
| 2019-09-01 | 315071 | 2.4% | |
| 2019-08-01 | 312588 | 2.4% | |
| 2019-07-01 | 310174 | 2.4% | |
| 2019-06-01 | 307742 | 2.3% | |
| 2019-05-01 | 305000 | 2.3% | |
| 2019-04-01 | 302328 | 2.3% | |
| 2019-03-01 | 299943 | 2.3% | |
| 2019-02-01 | 297348 | 2.3% | |
| 2019-01-01 | 294731 | 2.2% | |
| 2018-12-01 | 292010 | 2.2% | |
| 2018-11-01 | 289481 | 2.2% | |
| 2018-10-01 | 286795 | 2.2% | |
| Other values (20) | 5193159 | 39.5% |
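The frequency table above lists the 25 most common values and folds the remaining 20 distinct months into a single "Other values" row. A minimal sketch of that kind of roll-up is shown below; the function name `common_values` and the `top_n` parameter are illustrative assumptions, not the profiler's own API.

```python
from collections import Counter


def common_values(values, top_n=25):
    """Return (value, count, percent) rows for the top_n values plus an 'Other' row."""
    counts = Counter(values)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    rows = [(value, n, 100.0 * n / total) for value, n in top]
    # Everything outside the top_n is summarised in one trailing row.
    other_distinct = len(counts) - len(top)
    if other_distinct > 0:
        other_count = total - sum(n for _, n in top)
        label = f"Other values ({other_distinct})"
        rows.append((label, other_count, 100.0 * other_count / total))
    return rows


# Hypothetical toy column, not taken from the profiled dataset.
rows = common_values(["a"] * 5 + ["b"] * 3 + ["c", "d"], top_n=2)
```

For the toy column, the two least frequent values end up in a single "Other values (2)" row, mirroring the "Other values (20)" row in the month table.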
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Most occurring characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 41015486 | 31.2% | |
| 1 | 27599577 | 21.0% | |
| - | 26262504 | 20.0% | |
| 2 | 18497511 | 14.1% | |
| 9 | 4902759 | 3.7% | |
| 8 | 4517556 | 3.4% | |
| 7 | 3909232 | 3.0% | |
| 6 | 1167518 | 0.9% | |
| 5 | 1156959 | 0.9% | |
| 4 | 1146661 | 0.9% | |
| 3 | 1136757 | 0.9% |
Most occurring categories
| Value | Count | Frequency (%) | |
|---|---|---|---|
| Decimal Number | 105050016 | 80.0% | |
| Dash Punctuation | 26262504 | 20.0% |
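The 80% / 20% split above follows directly from the date format: each 10-character value such as `2020-10-01` has eight digits and two hyphens. The sketch below counts characters by Unicode general category using Python's standard `unicodedata` module, where `Nd` is "Decimal Number" and `Pd` is "Dash Punctuation". The helper name `category_counts` is an assumption for illustration, not the profiler's implementation.

```python
import unicodedata
from collections import Counter


def category_counts(strings):
    """Count characters by Unicode general category (e.g. 'Nd', 'Pd')."""
    counts = Counter()
    for s in strings:
        counts.update(unicodedata.category(ch) for ch in s)
    return counts


# Two values in the same format as the month column.
counts = category_counts(["2020-10-01", "2020-09-01"])
total = sum(counts.values())
shares = {cat: n / total for cat, n in counts.items()}
```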
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 41015486 | 39.0% | |
| 1 | 27599577 | 26.3% | |
| 2 | 18497511 | 17.6% | |
| 9 | 4902759 | 4.7% | |
| 8 | 4517556 | 4.3% | |
| 7 | 3909232 | 3.7% | |
| 6 | 1167518 | 1.1% | |
| 5 | 1156959 | 1.1% | |
| 4 | 1146661 | 1.1% | |
| 3 | 1136757 | 1.1% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| - | 26262504 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
|---|---|---|---|
| Common | 131312520 | 100.0% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 41015486 | 31.2% | |
| 1 | 27599577 | 21.0% | |
| - | 26262504 | 20.0% | |
| 2 | 18497511 | 14.1% | |
| 9 | 4902759 | 3.7% | |
| 8 | 4517556 | 3.4% | |
| 7 | 3909232 | 3.0% | |
| 6 | 1167518 | 0.9% | |
| 5 | 1156959 | 0.9% | |
| 4 | 1146661 | 0.9% | |
| 3 | 1136757 | 0.9% |
Most occurring blocks
| Value | Count | Frequency (%) | |
|---|---|---|---|
| ASCII | 131312520 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 41015486 | 31.2% | |
| 1 | 27599577 | 21.0% | |
| - | 26262504 | 20.0% | |
| 2 | 18497511 | 14.1% | |
| 9 | 4902759 | 3.7% | |
| 8 | 4517556 | 3.4% | |
| 7 | 3909232 | 3.0% | |
| 6 | 1167518 | 0.9% | |
| 5 | 1156959 | 0.9% | |
| 4 | 1146661 | 0.9% | |
| 3 | 1136757 | 0.9% |
months_since_joined
Real number (ℝ≥0)
| Distinct | 106 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 24.72676878 |
|---|---|
| Minimum | 1 |
| Maximum | 106 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 100.2 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 6 |
| median | 17 |
| Q3 | 37 |
| 95-th percentile | 72 |
| Maximum | 106 |
| Range | 105 |
| Interquartile range (IQR) | 31 |
Descriptive statistics
| Standard deviation | 22.70218206 |
|---|---|
| Coefficient of variation (CV) | 0.9181216626 |
| Kurtosis | 0.4530905045 |
| Mean | 24.72676878 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | 1.104203205 |
| Sum | 324693432 |
| Variance | 515.3890704 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 702582 | 5.4% | |
| 2 | 644194 | 4.9% | |
| 3 | 586733 | 4.5% | |
| 4 | 526036 | 4.0% | |
| 5 | 471271 | 3.6% | |
| 6 | 426669 | 3.2% | |
| 7 | 391525 | 3.0% | |
| 8 | 363110 | 2.8% | |
| 9 | 339933 | 2.6% | |
| 10 | 320380 | 2.4% | |
| 11 | 303274 | 2.3% | |
| 12 | 288230 | 2.2% | |
| 13 | 274392 | 2.1% | |
| 14 | 262089 | 2.0% | |
| 15 | 250702 | 1.9% | |
| 16 | 240239 | 1.8% | |
| 17 | 230669 | 1.8% | |
| 18 | 221430 | 1.7% | |
| 19 | 212851 | 1.6% | |
| 20 | 204892 | 1.6% | |
| 21 | 197393 | 1.5% | |
| 22 | 190301 | 1.4% | |
| 23 | 183561 | 1.4% | |
| 24 | 177180 | 1.3% | |
| 25 | 171119 | 1.3% | |
| Other values (81) | 4950497 | 37.7% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 702582 | 5.4% | |
| 2 | 644194 | 4.9% | |
| 3 | 586733 | 4.5% | |
| 4 | 526036 | 4.0% | |
| 5 | 471271 | 3.6% | |
| 6 | 426669 | 3.2% | |
| 7 | 391525 | 3.0% | |
| 8 | 363110 | 2.8% | |
| 9 | 339933 | 2.6% | |
| 10 | 320380 | 2.4% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 106 | 784 | < 0.1% | |
| 105 | 1694 | < 0.1% | |
| 104 | 2595 | < 0.1% | |
| 103 | 3508 | < 0.1% | |
| 102 | 4397 | < 0.1% | |
| 101 | 5356 | < 0.1% | |
| 100 | 6310 | < 0.1% | |
| 99 | 7279 | 0.1% | |
| 98 | 8281 | 0.1% | |
| 97 | 9293 | 0.1% |
referred_friends
Real number (ℝ≥0)
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9683107902 |
|---|---|
| Minimum | 0 |
| Maximum | 9 |
| Zeros | 5010003 |
| Zeros (%) | 38.2% |
| Memory size | 100.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 3 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 0.98818099 |
|---|---|
| Coefficient of variation (CV) | 1.020520478 |
| Kurtosis | 1.056525664 |
| Mean | 0.9683107902 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.025474779 |
| Sum | 12715133 |
| Variance | 0.976501669 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 5010003 | 38.2% | |
| 1 | 4799762 | 36.6% | |
| 2 | 2331315 | 17.8% | |
| 3 | 759880 | 5.8% | |
| 4 | 186565 | 1.4% | |
| 5 | 36632 | 0.3% | |
| 6 | 6095 | < 0.1% | |
| 7 | 900 | < 0.1% | |
| 8 | 89 | < 0.1% | |
| 9 | 11 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 5010003 | 38.2% | |
| 1 | 4799762 | 36.6% | |
| 2 | 2331315 | 17.8% | |
| 3 | 759880 | 5.8% | |
| 4 | 186565 | 1.4% | |
| 5 | 36632 | 0.3% | |
| 6 | 6095 | < 0.1% | |
| 7 | 900 | < 0.1% | |
| 8 | 89 | < 0.1% | |
| 9 | 11 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 9 | 11 | < 0.1% | |
| 8 | 89 | < 0.1% | |
| 7 | 900 | < 0.1% | |
| 6 | 6095 | < 0.1% | |
| 5 | 36632 | 0.3% | |
| 4 | 186565 | 1.4% | |
| 3 | 759880 | 5.8% | |
| 2 | 2331315 | 17.8% | |
| 1 | 4799762 | 36.6% | |
| 0 | 5010003 | 38.2% |
transaction_count
Real number (ℝ≥0)
| Distinct | 148 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 30.67165538 |
|---|---|
| Minimum | 0 |
| Maximum | 149 |
| Zeros | 45158 |
| Zeros (%) | 0.3% |
| Memory size | 100.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 12 |
| Q1 | 20 |
| median | 26 |
| Q3 | 38 |
| 95-th percentile | 64 |
| Maximum | 149 |
| Range | 149 |
| Interquartile range (IQR) | 18 |
Descriptive statistics
| Standard deviation | 16.32709408 |
|---|---|
| Coefficient of variation (CV) | 0.5323186467 |
| Kurtosis | 2.109577431 |
| Mean | 30.67165538 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 1.336677752 |
| Sum | 402757236 |
| Variance | 266.5740012 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 21 | 508275 | 3.9% | |
| 22 | 507235 | 3.9% | |
| 23 | 498900 | 3.8% | |
| 20 | 498434 | 3.8% | |
| 24 | 481533 | 3.7% | |
| 19 | 479820 | 3.7% | |
| 25 | 458337 | 3.5% | |
| 18 | 449234 | 3.4% | |
| 26 | 435258 | 3.3% | |
| 17 | 412095 | 3.1% | |
| 27 | 407648 | 3.1% | |
| 28 | 381177 | 2.9% | |
| 16 | 367367 | 2.8% | |
| 29 | 353649 | 2.7% | |
| 30 | 327000 | 2.5% | |
| 15 | 318842 | 2.4% | |
| 31 | 303186 | 2.3% | |
| 32 | 280148 | 2.1% | |
| 14 | 266073 | 2.0% | |
| 33 | 260134 | 2.0% | |
| 34 | 241689 | 1.8% | |
| 35 | 225266 | 1.7% | |
| 13 | 213049 | 1.6% | |
| 36 | 209947 | 1.6% | |
| 37 | 196567 | 1.5% | |
| Other values (123) | 4050389 | 30.8% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 45158 | 0.3% | |
| 1 | 16055 | 0.1% | |
| 2 | 16346 | 0.1% | |
| 3 | 17887 | 0.1% | |
| 4 | 21539 | 0.2% | |
| 5 | 25771 | 0.2% | |
| 6 | 32466 | 0.2% | |
| 7 | 40922 | 0.3% | |
| 8 | 52701 | 0.4% | |
| 9 | 69438 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 149 | 2 | < 0.1% | |
| 148 | 4 | < 0.1% | |
| 147 | 1 | < 0.1% | |
| 146 | 1 | < 0.1% | |
| 145 | 1 | < 0.1% | |
| 143 | 1 | < 0.1% | |
| 141 | 2 | < 0.1% | |
| 140 | 2 | < 0.1% | |
| 139 | 2 | < 0.1% | |
| 138 | 3 | < 0.1% |
transaction_value
Real number (ℝ≥0)
| Distinct | 12748860 |
|---|---|
| Distinct (%) | 97.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 169.7882566 |
|---|---|
| Minimum | 0 |
| Maximum | 2037.223285 |
| Zeros | 45158 |
| Zeros (%) | 0.3% |
| Memory size | 100.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 36.8957677 |
| Q1 | 59.18877707 |
| median | 89.30977162 |
| Q3 | 192.5665371 |
| 95-th percentile | 593.2800142 |
| Maximum | 2037.223285 |
| Range | 2037.223285 |
| Interquartile range (IQR) | 133.37776 |
Descriptive statistics
| Standard deviation | 198.0616219 |
|---|---|
| Coefficient of variation (CV) | 1.166521324 |
| Kurtosis | 9.355484017 |
| Mean | 169.7882566 |
| Median Absolute Deviation (MAD) | 40.46852149 |
| Skewness | 2.775731504 |
| Sum | 2229532384 |
| Variance | 39228.40606 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 45158 | 0.3% | |
| 40 | 7816 | 0.1% | |
| 37 | 7713 | 0.1% | |
| 41 | 7703 | 0.1% | |
| 42 | 7685 | 0.1% | |
| 43 | 7680 | 0.1% | |
| 39 | 7643 | 0.1% | |
| 38 | 7521 | 0.1% | |
| 44 | 7345 | 0.1% | |
| 45 | 7294 | 0.1% | |
| 36 | 7210 | 0.1% | |
| 46 | 7196 | 0.1% | |
| 35 | 7158 | 0.1% | |
| 47 | 6893 | 0.1% | |
| 34 | 6822 | 0.1% | |
| 48 | 6783 | 0.1% | |
| 49 | 6589 | 0.1% | |
| 33 | 6452 | < 0.1% | |
| 50 | 6358 | < 0.1% | |
| 51 | 6160 | < 0.1% | |
| 32 | 6013 | < 0.1% | |
| 52 | 5942 | < 0.1% | |
| 53 | 5752 | < 0.1% | |
| 31 | 5606 | < 0.1% | |
| 54 | 5554 | < 0.1% | |
| Other values (12748835) | 12921206 | 98.4% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 45158 | 0.3% | |
| 1 | 4032 | < 0.1% | |
| 1.000011627 | 1 | < 0.1% | |
| 1.000888659 | 1 | < 0.1% | |
| 1.001540567 | 1 | < 0.1% | |
| 1.001886487 | 1 | < 0.1% | |
| 1.00192594 | 1 | < 0.1% | |
| 1.00211409 | 1 | < 0.1% | |
| 1.002504755 | 1 | < 0.1% | |
| 1.002552263 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2037.223285 | 1 | < 0.1% | |
| 2023.78753 | 1 | < 0.1% | |
| 2008.624379 | 1 | < 0.1% | |
| 2008.176342 | 1 | < 0.1% | |
| 2005.91362 | 1 | < 0.1% | |
| 1994.490227 | 1 | < 0.1% | |
| 1991.680873 | 1 | < 0.1% | |
| 1991.21774 | 1 | < 0.1% | |
| 1987.945815 | 1 | < 0.1% | |
| 1981.53822 | 1 | < 0.1% |
clv
Real number (ℝ≥0)
| Distinct | 12761597 |
|---|---|
| Distinct (%) | 97.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4308.193435 |
|---|---|
| Minimum | 0 |
| Maximum | 67896.16483 |
| Zeros | 45158 |
| Zeros (%) | 0.3% |
| Memory size | 100.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 235.9424685 |
| Q1 | 445.9564406 |
| median | 1272.689167 |
| Q3 | 4899.340984 |
| 95-th percentile | 19322.88185 |
| Maximum | 67896.16483 |
| Range | 67896.16483 |
| Interquartile range (IQR) | 4453.384544 |
Descriptive statistics
| Standard deviation | 6905.088207 |
|---|---|
| Coefficient of variation (CV) | 1.602780449 |
| Kurtosis | 9.091475342 |
| Mean | 4308.193435 |
| Median Absolute Deviation (MAD) | 974.3590553 |
| Skewness | 2.776506826 |
| Sum | 5.657197366e+10 |
| Variance | 47680243.14 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 45158 | 0.3% | |
| 613.3135658 | 365 | < 0.1% | |
| 642.304537 | 357 | < 0.1% | |
| 830.8478751 | 356 | < 0.1% | |
| 768.5342845 | 354 | < 0.1% | |
| 690.1756909 | 350 | < 0.1% | |
| 861.7532421 | 347 | < 0.1% | |
| 739.234615 | 344 | < 0.1% | |
| 567.4826704 | 344 | < 0.1% | |
| 726.9918908 | 342 | < 0.1% | |
| 840.209411 | 340 | < 0.1% | |
| 709.3472378 | 338 | < 0.1% | |
| 510.9961501 | 337 | < 0.1% | |
| 747.6903318 | 335 | < 0.1% | |
| 914.0621986 | 335 | < 0.1% | |
| 759.213929 | 334 | < 0.1% | |
| 595.790321 | 334 | < 0.1% | |
| 699.2759872 | 332 | < 0.1% | |
| 651.8325969 | 332 | < 0.1% | |
| 789.3054814 | 332 | < 0.1% | |
| 883.2970731 | 331 | < 0.1% | |
| 648.3600552 | 331 | < 0.1% | |
| 747.7630876 | 329 | < 0.1% | |
| 523.3129775 | 328 | < 0.1% | |
| 779.1932429 | 328 | < 0.1% | |
| Other values (12761572) | 13077939 | 99.6% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 45158 | 0.3% | |
| 6.60911747 | 1 | < 0.1% | |
| 7.732931712 | 4 | < 0.1% | |
| 8.342663711 | 4 | < 0.1% | |
| 8.983601446 | 18 | < 0.1% | |
| 9.654756691 | 22 | < 0.1% | |
| 10.35473063 | 29 | < 0.1% | |
| 10.42832964 | 1 | < 0.1% | |
| 10.56667928 | 1 | < 0.1% | |
| 10.66270117 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 67896.16483 | 1 | < 0.1% | |
| 67447.20261 | 1 | < 0.1% | |
| 66939.13685 | 1 | < 0.1% | |
| 66928.0942 | 1 | < 0.1% | |
| 66850.22455 | 1 | < 0.1% | |
| 66464.80528 | 1 | < 0.1% | |
| 66374.47992 | 1 | < 0.1% | |
| 66361.74222 | 1 | < 0.1% | |
| 66253.85529 | 1 | < 0.1% | |
| 66034.91039 | 1 | < 0.1% |
total_reward_points
Real number (ℝ≥0)
| Distinct | 13130088 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7889.968914 |
|---|---|
| Minimum | 0 |
| Maximum | 596106.4354 |
| Zeros | 663 |
| Zeros (%) | < 0.1% |
| Memory size | 100.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 59.99981165 |
| Q1 | 299.3680261 |
| median | 1223.010107 |
| Q3 | 5595.998816 |
| 95-th percentile | 39473.54535 |
| Maximum | 596106.4354 |
| Range | 596106.4354 |
| Interquartile range (IQR) | 5296.63079 |
Descriptive statistics
| Standard deviation | 20170.91154 |
|---|---|
| Coefficient of variation (CV) | 2.55652611 |
| Kurtosis | 49.23429765 |
| Mean | 7889.968914 |
| Median Absolute Deviation (MAD) | 1110.968521 |
| Skewness | 5.792566483 |
| Sum | 1.036051701e+11 |
| Variance | 406865672.3 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 663 | < 0.1% | |
| 7.999999992 | 22 | < 0.1% | |
| 6.999999993 | 15 | < 0.1% | |
| 32 | 14 | < 0.1% | |
| 36 | 11 | < 0.1% | |
| 38 | 10 | < 0.1% | |
| 69 | 9 | < 0.1% | |
| 9.99999999 | 8 | < 0.1% | |
| 40 | 8 | < 0.1% | |
| 8.999999991 | 8 | < 0.1% | |
| 30 | 7 | < 0.1% | |
| 32 | 7 | < 0.1% | |
| 13.99999998 | 6 | < 0.1% | |
| 81 | 6 | < 0.1% | |
| 34 | 6 | < 0.1% | |
| 72 | 6 | < 0.1% | |
| 40 | 6 | < 0.1% | |
| 16.99999997 | 5 | < 0.1% | |
| 15.99999998 | 5 | < 0.1% | |
| 42 | 5 | < 0.1% | |
| 28 | 5 | < 0.1% | |
| 66 | 5 | < 0.1% | |
| 36 | 4 | < 0.1% | |
| 71.99999993 | 4 | < 0.1% | |
| 140.9999999 | 4 | < 0.1% | |
| Other values (13130063) | 13130403 | > 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 663 | < 0.1% | |
| 0.001331510022 | 1 | < 0.1% | |
| 0.004582775418 | 1 | < 0.1% | |
| 0.00735235792 | 1 | < 0.1% | |
| 0.009273340173 | 1 | < 0.1% | |
| 0.01139566639 | 1 | < 0.1% | |
| 0.01539847728 | 1 | < 0.1% | |
| 0.01702100338 | 1 | < 0.1% | |
| 0.01871016156 | 1 | < 0.1% | |
| 0.02341073477 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 596106.4354 | 1 | < 0.1% | |
| 587656.1896 | 1 | < 0.1% | |
| 578674.1918 | 1 | < 0.1% | |
| 569285.0824 | 1 | < 0.1% | |
| 559229.0461 | 1 | < 0.1% | |
| 552016.9758 | 1 | < 0.1% | |
| 549022.2706 | 1 | < 0.1% | |
| 543476.9792 | 1 | < 0.1% | |
| 539503.9764 | 1 | < 0.1% | |
| 538662.6931 | 1 | < 0.1% |
earned_reward_points
Real number (ℝ≥0)
| Distinct | 3249 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 365.3978279 |
|---|---|
| Minimum | 0 |
| Maximum | 13534 |
| Zeros | 115064 |
| Zeros (%) | 0.9% |
| Memory size | 100.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 15 |
| Q1 | 50 |
| median | 108 |
| Q3 | 318 |
| 95-th percentile | 1679 |
| Maximum | 13534 |
| Range | 13534 |
| Interquartile range (IQR) | 268 |
Descriptive statistics
| Standard deviation | 726.1654496 |
|---|---|
| Coefficient of variation (CV) | 1.987328315 |
| Kurtosis | 28.1180625 |
| Mean | 365.3978279 |
| Median Absolute Deviation (MAD) | 74 |
| Skewness | 4.515913938 |
| Sum | 4798130958 |
| Variance | 527316.2601 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 36 | 294812 | 2.2% | |
| 72 | 265280 | 2.0% | |
| 34 | 264388 | 2.0% | |
| 38 | 258962 | 2.0% | |
| 32 | 240499 | 1.8% | |
| 66 | 237516 | 1.8% | |
| 69 | 234240 | 1.8% | |
| 40 | 222419 | 1.7% | |
| 75 | 202892 | 1.5% | |
| 30 | 199810 | 1.5% | |
| 42 | 191644 | 1.5% | |
| 63 | 190786 | 1.5% | |
| 60 | 188311 | 1.4% | |
| 78 | 166323 | 1.3% | |
| 84 | 162057 | 1.2% | |
| 120 | 145934 | 1.1% | |
| 28 | 127468 | 1.0% | |
| 108 | 122536 | 0.9% | |
| 81 | 120952 | 0.9% | |
| 96 | 116554 | 0.9% | |
| 44 | 115419 | 0.9% | |
| 0 | 115064 | 0.9% | |
| 100 | 114642 | 0.9% | |
| 48 | 109061 | 0.8% | |
| 57 | 108197 | 0.8% | |
| Other values (3224) | 8615486 | 65.6% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 115064 | 0.9% | |
| 2 | 3467 | < 0.1% | |
| 3 | 7013 | 0.1% | |
| 4 | 10191 | 0.1% | |
| 5 | 12077 | 0.1% | |
| 6 | 15995 | 0.1% | |
| 7 | 18069 | 0.1% | |
| 8 | 27420 | 0.2% | |
| 9 | 32517 | 0.2% | |
| 10 | 50869 | 0.4% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 13534 | 1 | < 0.1% | |
| 13433 | 1 | < 0.1% | |
| 13200 | 3 | < 0.1% | |
| 13068 | 1 | < 0.1% | |
| 12969 | 3 | < 0.1% | |
| 12870 | 2 | < 0.1% | |
| 12838 | 5 | < 0.1% | |
| 12740 | 12 | < 0.1% | |
| 12642 | 6 | < 0.1% | |
| 12610 | 1 | < 0.1% |
cluster
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 100.2 MiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| B | 6644458 | 50.6% | |
| C | 3927959 | 29.9% | |
| A | 2558835 | 19.5% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Most occurring characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| B | 6644458 | 50.6% | |
| C | 3927959 | 29.9% | |
| A | 2558835 | 19.5% |
Most occurring categories
| Value | Count | Frequency (%) | |
|---|---|---|---|
| Uppercase Letter | 13131252 | 100.0% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| B | 6644458 | 50.6% | |
| C | 3927959 | 29.9% | |
| A | 2558835 | 19.5% |
Most occurring scripts
| Value | Count | Frequency (%) | |
|---|---|---|---|
| Latin | 13131252 | 100.0% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| B | 6644458 | 50.6% | |
| C | 3927959 | 29.9% | |
| A | 2558835 | 19.5% |
Most occurring blocks
| Value | Count | Frequency (%) | |
|---|---|---|---|
| ASCII | 13131252 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
|---|---|---|---|
| B | 6644458 | 50.6% | |
| C | 3927959 | 29.9% | |
| A | 2558835 | 19.5% |
churned
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.5 MiB |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| False | 12540477 | 95.5% | |
| True | 590775 | 4.5% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line does not affect the magnitude of r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
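The definition above (covariance divided by the product of the standard deviations) can be sketched in a few lines of plain Python. This is an illustrative implementation with made-up data, not the code the report itself runs; the common 1/(n - 1) factor cancels between numerator and denominator, so it is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: a constant input has zero variance and makes r undefined (division by zero).
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data, not taken from the profiled dataset.
x = [1, 2, 3, 4]
r_up = pearson_r(x, [3, 5, 7, 9])      # perfectly increasing line
r_down = pearson_r(x, [9, 7, 5, 3])    # perfectly decreasing line
# Shifting and rescaling x leaves r unchanged (location/scale invariance).
r_rescaled = pearson_r([10 * a + 5 for a in x], [3, 5, 7, 9])
```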
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
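As a sketch of the rank-based calculation described above: replace each variable by its ranks, then apply the Pearson formula to the ranks. The tie handling here (tied values share the average of their ranks) is a common convention and an assumption on our part; the toy data is made up.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance over the product of standard deviations (scale factors cancel)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: rho is 1 while Pearson's r is below 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```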
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
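The pair-counting definition above corresponds to the variant usually called τ-a. The following sketch implements exactly that formula on made-up data; statistical libraries often default to the tie-corrected τ-b instead, so values can differ when the data contains ties. The function name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical toy data: one swapped pair out of six gives (5 - 1) / 6.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```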
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | customer_id | month | months_since_joined | referred_friends | transaction_count | transaction_value | clv | total_reward_points | earned_reward_points | cluster | churned |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2017-02-01 | 62 | 1 | 58.0 | 743.785020 | 24461.876134 | 41822.999177 | 2146.0 | B | False |
| 1 | 4 | 2017-02-01 | 62 | 1 | 17.0 | 185.170485 | 6089.955209 | 9374.999673 | 153.0 | B | False |
| 2 | 7 | 2017-02-01 | 62 | 1 | 71.0 | 71.000000 | 2335.074193 | 14302.040625 | 213.0 | B | False |
| 3 | 14 | 2017-02-01 | 62 | 0 | 25.0 | 187.986119 | 6182.556815 | 3200.996803 | 225.0 | C | False |
| 4 | 20 | 2017-02-01 | 62 | 0 | 63.0 | 342.682994 | 11270.284723 | 31744.997234 | 1071.0 | B | False |
| 5 | 30 | 2017-02-01 | 62 | 0 | 48.0 | 247.572585 | 8142.258498 | 16286.604891 | 576.0 | C | False |
| 6 | 40 | 2017-02-01 | 62 | 0 | 20.0 | 82.002807 | 2696.938577 | 2939.997050 | 80.0 | C | False |
| 7 | 43 | 2017-02-01 | 62 | 2 | 15.0 | 174.113245 | 5726.300627 | 5386.062256 | 120.0 | B | False |
| 8 | 47 | 2017-02-01 | 62 | 3 | 69.0 | 400.679664 | 13177.700606 | 59622.632574 | 1380.0 | B | False |
| 9 | 89 | 2017-02-01 | 62 | 2 | 37.0 | 189.368177 | 6228.010481 | 8602.997289 | 333.0 | A | False |
Last rows
| | customer_id | month | months_since_joined | referred_friends | transaction_count | transaction_value | clv | total_reward_points | earned_reward_points | cluster | churned |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 13131242 | 1423199 | 2020-10-01 | 1 | 0 | 27.0 | 72.919534 | 345.643473 | 79.222762 | 81.0 | B | False |
| 13131243 | 1423200 | 2020-10-01 | 1 | 2 | 21.0 | 59.631172 | 282.655748 | 41.994693 | 42.0 | B | False |
| 13131244 | 1423201 | 2020-10-01 | 1 | 1 | 20.0 | 54.684560 | 259.208475 | 39.998835 | 40.0 | C | False |
| 13131245 | 1423202 | 2020-10-01 | 1 | 0 | 17.0 | 47.206802 | 223.763401 | 33.999995 | 34.0 | C | False |
| 13131246 | 1423203 | 2020-10-01 | 1 | 1 | 13.0 | 33.609168 | 159.309706 | 12.489877 | 13.0 | B | True |
| 13131247 | 1423204 | 2020-10-01 | 1 | 0 | 26.0 | 72.892389 | 345.514806 | 77.987312 | 78.0 | B | True |
| 13131248 | 1423205 | 2020-10-01 | 1 | 0 | 19.0 | 52.500548 | 248.856114 | 37.988428 | 38.0 | B | False |
| 13131249 | 1423206 | 2020-10-01 | 1 | 2 | 19.0 | 55.659544 | 263.829967 | 37.999962 | 38.0 | C | False |
| 13131250 | 1423207 | 2020-10-01 | 1 | 1 | 22.0 | 63.473847 | 300.870287 | 65.999949 | 66.0 | B | False |
| 13131251 | 1423208 | 2020-10-01 | 1 | 2 | 22.0 | 65.412040 | 310.057450 | 65.999415 | 66.0 | C | False |